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Abstract 

We  present  an  efficient  network  algorithm  for  generating 
exact  permutational  distributions  for  linear  rank  tests 
defined  on  stratified  2 x c contingency  tables.  The  al- 
gorithm can  evaluate  exact  one  and  two  sided  p-values, 
and  compute  exact  confidence  intervals  for  trend  param- 
eters arising  from  certain  loglinear  and  logistic  models 
embedded  in  these  contingency  tables.  It  is  especially 
efficient  for  highly  imbalanced  categorical  data,  a situ- 
ation where  the  asymptotic  theory  is  unreliable.  Part 
of  the  algorithm  can  be  adapted  to  evaluating  the  con- 
ditional maximum  likelihood  and  its  derivatives  for  the 
logistic  regression  model,  with  grouped  data.  We  illus- 
trate the  techniques  with  an  analysis  of  two  data  sets;  the 
leukemia  data  on  the  Hiroshima  atomic  bomb  survivors, 
and  data  from  a clinical  trial  of  bone  marrow  transplant. 


1 Introduction 

Linear  rank  tests  play  a major  role  in  nonparametric  in- 
ference. The  Chernoff-Savage  theorem  (1958)  ensures 
the  asymptotic  normality  of  these  tests,  and  indeed,  for 
continuous  data  the  asymptotic  results  work  very  well. 
By  the  time  the  sample  size  is  around  30,  there  is  very 
little  difference  between  the  asymptotic  distribution  of 
a linear  rank  test  statistic  and  its  exact  permutational 
distribution.  However  this  is  not  the  case  for  categorical 
data.  Here  the  rate  of  convergence  to  asymptotic  normal- 
ity depends  on  more  than  just  sample  size.  The  number 
of  ties  in  each  category,  the  group  imbalance,  and  the 
choice  of  rank  scores,  all  affect  the  shape  of  the  permuta- 
tion distribution  in  complicated  ways,  making  it  difficult 
to  predict  a priori  whether  the  asymptotic  results  for  a 
given  data  set  are  reliable.  It  is  important  therefore  to 


develop  efficient  numerical  algorithms  to  supplement  ex- 
isting asymptotic  results  for  the  categorical  case.  These 
algorithms  serve  both  the  data  analyst  concerned  about 
the  validity  of  the  inference  in  small,  sparse,  or  imbal- 
anced data  sets,  and  the  theoretical  statistician  develop- 
ing new  asymptotic  methods  and  wishing  to  confirm  that 
the  theory  is  accurate. 

This  paper  develops  a very  fast  algorithm  for  generating 
exact  permutation  distributions  for  linear  rank  tests  de- 
fined on  stratified  2 x c contingency  tables.  The  permuta- 
tional problem  is  formulated  very  precisely  in  Section  2. 
A network  algorithm  for  solving  the  problem  is  presented 
in  Section  3.  A major  strength  of  the  algorithm  is  that  its 
limits  of  computational  feasibility  increase  with  the  de- 
gree of  imbalance  between  the  groups  being  compared. 
This  is  precisely  where  it  is  needed  most,  since  the  re- 
liability of  asymptotic  results  decrease  as  the  imbalance 
increases.  In  another  paper  we  analyze  some  case-control 
data  in  which  the  total  sample  size  is  99,960.  Yet,  be- 
cause of  the  severe  imbalance  between  cases  and  controls, 
the  asymptotic  results  differ  from  the  exact  ones.  The 
algorithm  developed  here  performs  exact  permutational 
inference  on  the  data  set  with  no  difficulty  whatsoever, 
despite  its  enormous  sample  size. 

The  inference  techniques  discussed  in  this  paper  are  con- 
ditional. This  is  true  both  for  the  exact  as  well  as  the 
asymptotic  inference.  Exact  methods  for  parameter  es- 
timation naturally  require  strong  numerical  algorithms. 
But  it  is  not  generally  recognized  that  conditional  infer- 
ence places  a heavy  computational  burden  on  the  maxi- 
mum likelihood  estimation  as  well.  A by-product  of  the 
algorithmic  development  in  Section  3 is  its  applicability 
to  the  problem  of  estimating  model  parameters  by  max- 
imizing a conditional  likelihood  function  and  evaluating 
its  first  two  derivatives.  Without  our  algorithm,  evaluat- 
ing the  conditional  likelihood,  even  though  it  only  yields 
asymptotic  estimates,  would  be  almost  as  difficult  as  the 
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exact  inference. 


2 Statistical  Formulation 


In  this  section  we  formulate  a general  permutation  prob- 
lem whose  solution  will  make  exact  statistical  inference 
possible  for  a rich  class  of  linear  rank  tests,  defined  on  or- 
dered categorical  or  binary  data.  The  computational  dif- 
ficulties encountered  with  the  permutation  problem  are 
discussed,  setting  the  stage  for  the  development  of  an 
efficient  numerical  algorithm,  in  Section  3. 


2.1  Tabular  Representation  of  the  Data 

The  data  can  be  represented  as  a collection  of  s 2 X c 
contingency  table  consisting  of  2 rows,  c columns,  and 
s strata.  A specific  collection,  or  three  way  table,  of  this 
type,  denoted  by  x = (xi,X2, . . .x,),  is  displayed  below: 


| Stratum  1 

Rows 

Col-1 

Col-2  ... 

Col-c 

Row-Total 

Row.l 

*11 

X2\ 

Xcl 

mi 

Row-2 

x1 

*11 

x'n 

m? 

Col-Total 

nil 

nji 

«cl 

Ni 

Col-Score 

Wl 

W2 

We 

| Stratum  2 

Rows 

Col-1 

Col-2  ... 

Col-c 

Row-Total 

Row-1 

*12 

X22 

*c2 

mj 

Row.2 

X\7 

X21 

Xr2 

Col- Total 

nij 

n32 

ncj 

n2 

Col- Score 

W1 

W2 

1 Stratum  s | 

Rows 

Col.1 

Col-2  . . . 

Colx 

Row- Total 

Row.l 

X2» 

Xct 

m, 

Row.2 

*» . 

< 

Col-Total 

ni» 

n2, 

Tics 

N, 

Col-Score 

Wl 

W2 

xuc 

The  above  tabular  representation  accommodates  both 
the  comparison  of  two  multinomial  populations  and  the 
comparison  of  k binomial  populations.  In  either  case  we 
may  adjust  for  possible  covariate  effects  by  stratification. 
Unstratified  data  may  be  regarded  as  a special  case  with 
« = 1. 

Two  Multinomial  Populations  The  two  rows  of 
stratum  k represent  two  independent  multinomial 


populations.  Each  observation  falls  into  one  of  c or- 
dinal response  categories,  Thus  Xjk  is  the  number  of 
stratum  k observations,  out  of  a total  of  mk,  falling 
into  ordered  category  j for  population  1,  and  t is 
the  number  of  stratum  k observations,  out  of  a total 
of  m'k,  falling  into  ordered  category  j for  popula- 
tion 2.  The  stratum  invariant  scores,  w: , w2, . . . we, 
are  numerical  values  assigned  to  the  c ordered  multi- 
nomial response  categories. 

Several  Binomial  Populations  The  c columns  of 
stratum  k represent  c independent  binomial  popu- 
lations with  row  1 representing  successes  and  row  2 
representing  failures.  For  population  j and  stra- 
tum k there  are  Xjk  successes  and  t failures  in  rijk 
independent  Bernoulli  trials.  The  stratum  invari- 
ant scores,  u>i,tt>2,  ...u>c  typically  represent  doses, 
or  levels  of  exposure,  affecting  the  success  rates  of 
the  c binomial  populations. 


2.2  Exact  Conditional  Inference 

Define  the  reference  set  for  the  Fth  stratum,  T*,  as  all 
possible  2 x c contingency  tables  whose  row  and  column 
margins  are  fixed  at  the  corresponding  values  of  the  ob- 
served 2 x c table,  x*: 


r*  = {yt:  y*  is  2 x c;  yjk  + y'jk  = n;t,Vj; 

C C 

j=i  i-\ 

Define  the  full  reference  set  as  the  cartesian  product  of 
the  reference  sets  across  all  s strata: 


0 = Ti  x r2  x ...  x T,  = {y:  y*  € Tk,k  = 1,2, ...s  } 


The  test  statistic,  T,  is  defined  as  a sum  of  linear  rank 
statistics  over  the  s strata: 

T = T1+T2  + ...  + T,  , 

where  each  Tk  can  only  take  on  the  values  tk  of  the  form 

C 

tk  = 52wjVj k - 

;=i 

for  some  yk  € T* , and  a fixed  set  of  scores,  W\,W2, . . .wc. 
By  a suitable  choice  of  scores  one  can  obtain  a very  rich 
class  of  linear  rank  tests.  The  distribution  of  the  test 
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statistic,  T,  is  derived  by  limiting  the  sample  space  to 

£ € 0. 

Under  the  null  hypothesis  of  no  row  and  column  interac- 
tion the  conditional  probability  distribution  of  T*  given 
y*  € T*  is 


/*(<*)  = 


C) 


(2.1) 


where 


Im*  = {y*  € IV  ^wjyjk  = <*}  . 

;= i 

Then  by  convolution,  the  conditional  probability  distri- 
bution of  T,  given  y € 0,  is 

no = — rn  \ ■ M 

ra-(*) 

where 

« c 

0t  = {ye0:  = 0 • 

t=i j=i 

Notice  that  (2.2)  is  a sum  of  generalized  hypergeometric 
probabilities  and  is  free  of  all  unknown  parameters.  This 
enables  us  to  compute  exact  p-values  for  all  the  linear 
rank  tests  listed  above.  We  can  also  compute  the  first  two 
moments  of  T and  thereby  perform  asymptotic  inference 
by  appealing  to  the  Chernoff-Savage  theorem. 


2.3  Parameter  Estimation 

For  data  arising  from  two  multinomial  distributions  or  c 
binomial  distributions,  we  can  specify  loglinear  and  logis- 
tic models,  respectively,  for  the  data  generating  process. 
Let  TTj*  be  the  probability  that  a subject  from  stratum  k 
is  classified  as  falling  into  row  1 and  column  j.  Let 
be  the  probability  that  a subject  from  stratum  k is  clas- 
sified as  falling  into  row  2 and  column  j.  If  the  two 
rows  of  each  stratum  represent  data  from  two  multino- 
mial populations,  the  above  probabilities  must  satisfy  the 
constraints 

;= i ;=i 


for  Jb  = 1 , 2, . . . 8.  If  the  c columns  of  each  stratum  repre- 
sent data  from  c binomial  populations,  the  above  proba- 
bilities must  satisfy  the  constraints 

Trjk  + irjt  = 1 , 

for;  = 1,2, ...c,  and  Jk  = l,2,...s.  In  either  case  we 
assume  that  there  is  no  three-factor  interaction  so  that 
the  c - 1 odds  ratios 


~ f } 

*1  k*jk 

j = 2, 3, . . . c,  do  not  depend  on  Jb.  Next  we  model  these 
odds  ratios  as  a function  of  the  scores.  If  the  data  have 
been  generated  from  two  stratified  multinomial  popula- 
tions, it  is  natural  to  derive  the  odds  ratios  from  a log- 
linear  model  with  a linear  by  linear  row  times  column 
association  (Agresti,  1990,  page  275,  equation  (8.11)). 
In  the  present  context  the  linear  by  linear  model  speci- 
fies the  following  expected  cell  counts  on  the  logarithmic 
scale: 

log(mtTjt)  = ajk  + Put, 

for  row  1,  and 

I°gK*;t)  = an 

for  row  2. 


If  the  data  have  been  generated  from  c stratified  binomial 
populations  it  is  natural  to  derive  the  odds  ratios  from  a 
logistic  regression  model  (Cox,  1970): 

log  = ak  + 0Wj  . 

*jk 

Both  models  yield  the  relationship 


logtfy  =0{wj-Wi)  , (2.3) 

where  0 is  an  unknown  parameter  to  be  estimated  from 
the  data.  It  can  be  shown  that  T is  a sufficient  statistic 
for  0 under  both  the  linear  by  linear  association  model 
and  the  logistic  regression  model.  Moreover,  the  con- 
ditional distribution  of  T,  given  (y i , y2> ■••y»)  € 0,  de- 
pends only  on  0,  other  (nuisance)  parameters  being  elim- 
inated by  the  conditioning.  This  conditional  distribution 
is  given  by 

_ /(Qexp(fl) 

E»/(«)ex  P(0u)'  [ ’ 

where  the  denominator  of  equation  (2.4)  is  simply  the 
normalizing  constant  obtained  by  summing  over  all  pos- 
sible values  of  T.  When  0 = 0 we  obtain  the  null  distri- 
bution (2.2). 
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The  conditional  maximum  likelihood  estimate  (cmle)  of 
0 is  obtained  by  finding  the  value  of  0 that  maximizes 
the  conditional  probability  (2.4)  at  the  observed  value 
T = a o.  To  obtain  the  variance  of  the  cmle  we  need  the 
second  derivative  of  the  log  likelihood,  evaluated  at  the 
cmle.  Both  the  cmle  and  its  variance  may  be  rapidly 
evaluated  by  repeated  backward  induction  on  a network, 
as  discussed  in  detail  in  Section  3.  We  can  then  use 
these  estimates  to  perform  asymptotic  hypothesis  tests 
or  compute  asymptotic  confidence  intervals  for  0. 

To  obtain  an  exact  confidence  interval  for  0 we  need 
the  coefficients  f(t)  for  all  values  of  T in  the  tails  its 
distribution.  A network  algorithm  for  this  computation 
is  described  in  Section  3.  Once  these  coefficients  have 
been  computed,  the  conditional  tail  probabilities,  T > 
ao,  or  T < a0,  for  any  value  of  0 , may  be  derived  from 
equation  (2.4).  Exact  confidence  bounds  for  0 are  then 
obtained  by  inverting  corresponding  UMP  unbiased  tests 
for  0 , as  shown  in  Cox  (1970).  For  example,  a 100(1  - 
a)%  lower  confidence  bound  for  0,  say  0(ao),  would  be 
obtained  as  the  solution  to 

f mar 

£/(f|/?(a0))  = «.  (2.5) 

t=a0 

The  solution  to  equation  (2.5)  may  be  rapidly  evaluated 
by  a simple  binary  search  because,  as  shown  in  (2.4),  f(t) 
and  0 are  separable  in  the  expression  for  f(t\0). 

2.4  Computational  Issues 

From  the  above  discussion  it  is  clear  that  a broad  class 
of  exact  linear  rank  tests  and  parameter  estimates  can 
be  obtained  if  we  are  able  to  compute  truncated  distri- 
butions of  the  form 

fi  = {(<,/(<)):  f>ao}  . (2.6) 

Exhaustive  enumeration  of  all  the  tables  in  0 for  gen- 
erating Q would  be  computationally  explosive.  Con- 
sider the  simple  case  of  a single  stratum,  no  ties,  and 
m = m'  = N/2.  The  number  of  tables  in  the  reference 
set  0 for  various  values  of  N is 


Sample  Size  ( N ) 

Tables  in  Reference  Set  (r) 

20 

1.8  x 105 

30 

1.5  x 108 

40 

1.4  x 1011 

50 

1.3  x 1014 

100 

1.0  x 1029 

If  there  were  s strata,  the  size  of  the  corresponding  ref- 
erence set  would  be  raised  to  the  sth  power.  It  is  clear 
that  even  in  the  very  powerful  computing  environment 
available  today,  explicit  enumeration  of  all  the  tables  in 
the  reference  set  0 rapidly  becomes  computationally  in- 
feasible. However  much  recent  research,  for  example, 
Mehtaet.  al.  (1984)  (1985)  (1988),  Pagano  and  Tritchler 
(1983),  Tritchler  (1984),  Streitberg  and  Rohmel  (1986), 
and  Hollander  and  Pena  (1988),  has  focused  on  implicit 
enumeration  of  the  tables  in  0,  thereby  considerably  ex- 
tending the  size  of  problem  for  which  exact  inference  is 
possible. 

Mehta,  Patel  and  Tsiatis  (1984),  and  Mehta,  Patel  and 
Wei  (1988),  developed  a network  algorithm  for  implicit 
enumeration  of  all  the  2 x c contingency  tables  in  the  ref- 
erence set  T,  defined  for  a single  stratum  (s  = 1).  Mehta, 
Patel  and  Gray  (1985)  developed  a network  algorithm  for 
implicit  enumeration  of  s 2 x 2 contingency  tables  (where 
s > 1).  The  present  paper  generalizes  the  earlier  work 
to  8 independent  2 x c contingency  tables,  a considerably 
more  difficult  problem.  An  alternative  method  would  be 
to  treat  the  s 2 x c problem  as  a special  case  of  condi- 
tional logistic  regression  and  directly  use  the  exact  algo- 
rithm of  Hirji,  Mehta  and  Patel  (1988).  However  that 
would  not  exploit  the  special  structure  of  the  problem 
in  the  way  that  the  present  algorithm  does.  We  conjec- 
ture that  the  algorithm  presented  here  is  the  fastest  one 
currently  available  for  categorical  data,  with  unequally 
spaced  Wj  scores.  In  another  paper  we  perform  exact  in- 
ference on  some  rather  large  data  sets,  to  illustrate  how 
powerful  the  algorithm  is,  and  to  set  up  a benchmark 
against  which  competing  algorithms  may  be  evaluated. 

A second  contribution  of  this  paper  is  to  provide  an  effi- 
cient numerical  algorithm  for  computing  the  cmle  for  0 
(equation  2.3)  and  its  standard  error.  A previous  algo- 
rithm for  this  problem,  in  the  more  general  conditional 
logistic  regression  setting,  was  developed  by  Gail,  Lu- 
bin,  and  Rubenstein  (1981).  Our  algorithm  is  equiva- 
lent to  theirs  for  data  with  no  ties,  but  is  considerably 
more  efficient  for  categorical  data.  In  another  paper,  we 
show  that  the  Gail  et.  al.,  algorithm,  as  implemented 
in  the  EGRET  (1988)  software  package,  is  unable  to 
compute  conditional  maximum  likelihood  estimates  for 
a large  heavily  tied  data  set,  whereas  our  algorithm,  ob- 
tains the  required  estimates  very  rapidly. 
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3 Numerical  Algorithms 

We  provide  numerical  algorithms  for  two  problems;  gen- 
erating the  truncated  permutation  distribution  0,  de- 
fined by  (2.6),  and  computing  the  cmle  for  /?,  say  /?, 
along  with  its  standard  error,  a.  Both  problems  are 
solved  within  one  unified  framework  wherein  the  refer- 
ence set  0 is  represented  as  a network.  We  will  see  that 
processing  the  network  in  the  forward  direction  yields 
fl,  while  processing  the  same  network  in  the  backward 
direction  yields  /?  and  its  standard  error. 

3.1  Generating  an  Overall  Truncated 
Permutation  Distribution 

Our  goal  is  to  generate  the  truncated  permutation  dis- 
tribution Q for  T,  the  sum  of  linear  rank  statistics  across 
all  the  strata.  Our  strategy  will  be  to  generate  s inde- 
pendent stratum  specific  truncated  permutation  distri- 
butions of  the  form 

n*  = {(/*,/*(<*)):  tk>ak}, 

at  the  cut-off  points 

aJfc  = a0  — ^ , 

i/k 

for,  k = 1, 2, ...  s.  Here  tkimax  is  the  maximum  value  of 
the  random  variable  T*,  and  is  easily  evaluated  as  part 
of  the  backward  induction  step  discussed  below.  We  will 
perform  pairwise  convolutions  on  these  stratum  specific 
distributions  until  the  overall  distribution  is  obtained. 
Thus  there  are  two  steps  to  be  performed  repeatedly; 
a distribution  generation  step,  and  a convolution  step. 
These  steps  are  described  next  in  separate  subsections. 

3.1.1  Generating  Stratum  Specific  Truncated 
Permutation  Distributions 

Suppose  we  wish  to  generate  the  truncated  permutation 
distribution  Qk,  for  the  fcth  stratum.  In  principle  this 
involves  enumerating  all  the  2 x c contingency  tables 
yt  € T*,  computing  the  value  of  tk  = Ylj=iwjyjk  for 
each  one,  and  summing  the  hypergeometric  probabili- 
ties of  all  the  tables  yk  € r*>l|k,  as  shown  in  (2.2).  We 
do  this  enumeration  implicitly  rather  than  explicitly,  by 
representing  the  reference  set  T*  as  a network  of  nodes 
and  arcs,  and  then  processing  the  network  in  a recursive 
stage- wise  fashion. 


Network  Representation  of  I1* 

The  network  representation  of  the  reference  set,  T*,  is 
constructed  in  c+ 1 stages  labelled  0, 1, ...  c,  where  stage 
j corresponds  to  the  jth  column  of  a typical  2 x c table 
in  IV  At  stage  j there  exist  a,  set  of  nodes  of  the  form 
(j,mjk),  where  each  mjk  = 5Zi=i  W*  corresponds  to  one 
distinct  partial  sum  of  the  first  j columns  of  the  tables 
yk  6 IV  Arcs  emanate  from  each  node  ( j,mjk ) and 
connect  it  to  successor  nodes  of  the  form  ( j + 1,  mJ+ 1,*). 
These  successor  nodes  may  be  specified  explicitly  as  the 
set 

e 

R(j,mjk)  = {(j  + l,m, -+!,*):  max(mjt,mt-  »/*) 

r-i+2 

< mj+i,k  < min (mjk  + n;+ i,t,  mk)}  . (3.7) 

Starting  at  stage  0 with  initial  node  (0,0),  and  apply- 
ing (3.7)  successively  to  the  nodes  at  stages  1,2, . . ,c  - 1, 
we  automatically  end  up  with  the  unique  terminal  node 
(c,mife).  In  this  construction  each  path,  or  sequence  of 
connected  arcs  of  the  form 

(0,0)  ► (1,  triiic)  >(c,mk)  (3.8) 

corresponds  to  one  and  only  one  table  yk  £ T*,  with 
j ijk  = mjk  - for  j = 1,2, ...c.  Thus  the  tables 

in  Ti  are  in  one-to-one  correspondence  with  the  paths 
through  the  network. 

To  complete  the  network  representation  we  assign  to 
each  arc 

{j  - 1 ,mj-ltk)->(j,mJk) 

a rank  length 

rjk  = Wj(mjk  - mj-iik) 
and  a probability  length 

»*  = ( m„ -m,-,..  <3-9> 

The  rank  length  of  a complete  path  of  the  form  (3.8)  con- 
necting the  initial  node  to  the  terminal  node  is  defined 
as  the  sum  of  rank  lengths  of  the  individual  arcs  consti- 
tuting that  path.  Its  probability  length  is  the  product 
of  probability  lengths  of  the  individual  arcs  constituting 
that  path.  The  distribution  of  Tk  is  then  the  same  as  the 
distribution  of  rank  lengths  of  all  the  paths  in  T*. 

Backward  Induction  on  r* 

We  can  obtain  much  useful  information  about  the  dis- 
tribution of  Tk  very  quickly,  by  a single  backward  pass 
through  the  network  T*.  At  any  node  (j,  mjk)  define 
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the  sub-network,  r^j,  m;t),  to  be  the  set  of  all  possi- 
ble paths  from  (j,mjk)  to  the  terminal  node  (c,m*).  In 
other  words  Tk(j,mjk)  consists  of  all  possible  values  of 
the  entries  in  columns  (j  + 1 , j + 2, . . . , c)  of  the  2 x c con- 
tingency tables  in  T*  whose  first  j columns  sum  to  mj k. 
Now  define  the  length  of  the  longest  path  in  T k(j,mjk) 
by 

e 

LP(j,mjk)  = max  { £ r,k)  , (3.10) 

the  length  of  the  shortest  path  in  r*, (j,  mjt)  by 

C 

SP(j,mjk)  = min  { £ r,k)  , (3.11) 

r*0,my»)  4* 

and  the  sum  of  probability  lengths  of  all  the  paths  in 
Tk(j,mjk)  by 

C 

TP(hmjk)=  £ M 

The  values  of  LP,  SP,  and  TP  can  be  rapidly  ob- 
tained by  backward  induction.  We  illustrate  how  this 
is  done  for  LP.  Set  LP(c,mk)  = 0.  Now  suppose  that 
LP(j  + l,mj+lit)  is  known  for  every  node  at  stage  j + 1. 
Move  backwards  to  stage  j,  select  a node  (j,mjik),  and 
compute 

LP{j,mjk)  = max  {rj+lii  + LP(j  + l,m,+i,*)}  . 

R-O'.mjk) 

(3.13) 

Repeat  this  process  for  every  node  at  stage  j and  then 
move  back  one  more  stage.  Proceeding  in  this  manner 
we  reach  stage  (0,0)  having  evaluated  the  LP  values  for 
all  the  nodes  of  the  network.  The  other  nodal  quantities 
may  be  obtained  similarly. 

Processing  in  the  Forward  Direction 
Starting  with  the  initial  node  (0,0),  we  process  the  net- 
work in  the  forward  direction,  stage  by  stage,  in  such  a 
way  that  by  the  time  we  reach  the  terminal  node,  (c,  m*), 
we  will  have  generated  the  desired  truncated  distribu- 
tion fit-  First  we  introduce  some  notation.  At  any  node 
(j,mjk)  define  the  sub-network,  Tk(j, mjk),  to  be  the 
set  of  all  possible  paths  from  the  starting  node  (0,0)  to 
(j,  mjk).  In  other  words,  Tk(j,mjk)  consists  of  all  pos- 
sible values  of  the  entries  in  columns  (l,2,...j)  of  the 
2 x c contingency  tables  in  T*  whose  first  j columns  sum 
to  mjk-  (Notice  that  this  set  differs  from  r*(j, rrijk), 
which  specifies  the  last  c - j + 1 columns  of  these  tables.) 
Denote  a generic  path, 

(0,0)  -»  (l,mu)  — * *(j,mJk) 


in  T k(j,mjk)  by  r.  The  rank  length  of  r is 

(=1 

and  its  probability  length  is 

i 

P(7‘)  = IIp't  • 

i=i 

There  will  typically  be  several  paths,  r € T *(j,m;*), 
each  having  the  same  rank  length,  r(r)  = u.  Let  c(u)  be 
the  sum  of  probability  lengths  of  all  these  paths.  That 

is. 

c(«)  = 53  p (r)  • 

{reTkO’.mjt):  r(r)=u} 

We  now  provide  a recursive  procedure  for  processing 
the  network  in  the  forward  direction.  Suppose  we  have 
reached  stage  j of  the  network  in  such  a way  that  at  each 
of  its  nodes,  ( j , mjk),  we  are  carrying  a set  of  records 

A(i,mjt)  = {(u,c(u)):  u = r(r),u  + LP{j,mjik)  > ak, 

reTk{j,mjk)}  . 

The  following  five-step  algorithm  is  used  to  update  these 
sets  and  thereby  move  forward  to  stage  j + 1. 

Step  1:  Select  a record  (u,c(u))  € A(j,m;*). 

Step  2:  Transmit  a copy  of  this  record  to  each  succes- 
sor node  ( j + l,m;+i,t),  where  the  successors  are 
identified  by  (3.7). 

Step  3:  At  each  successor  node,  ( j + l,nij+iik),  trans- 
form the  transmitted  record  to  (u*,c*),  where  u*  = 
u + ri+i,t.  and  c*  = c(«)P;+i,t- 

Step  4:  Insert  (u*,c*)  into  A(j  + l.mj+j,*)  as  follows: 

1.  If  u*  + LP(j+l,mj+\}k)  < akl  drop  this  record 
from  further  consideration,  and  go  to  Step  5. 
Otherwise  continue  with  the  insertion  as  de- 
scribed below.  (The  value  of  LP  is  available 
from  the  backward  induction  on  IV) 

2.  If  there  already  exists  a record  (u,c(u))  £ 
A (j  + l,mj+i,i)  such  that  u = u*,  then  merge 
the  two  records  by  replacing  ( u,c(u ))  with 
(u,c(u)  -|-  c*)  £ A (j  + l,mj+1)i). 

3.  If  no  record  currently  in  A(j  + l,mj+i,t)  has 
u = u*,  then  augment  A (j  + l,m;+i,t)  by 
adding  (u,c(u))  to  it,  as  a new  record. 
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The  technique  of  hashing  (Sedgewick  1983,  page 
201)  is  used  to  search  for  matches  and  either  merge 
or  augment  records  in  A^'+l.mj+i^).  This  ensures 
an  optimum  trade-off  between  efficient  use  of  avail- 
able memory  and  fast  search. 

Step  5:  Return  to  Step  1. 


The  above  5-step  algorithm  continues  until  every  record 
in  A(j,rtijk)  has  been  processed.  Then  another  node  at 
stage  j is  selected,  and  all  its  records  are  processed  in 
accordance  with  the  above  5 steps.  When  all  nodes  at 
stage  j have  been  exhausted,  repeat  Steps  1 through  5 
for  stage  j + 1.  Starting  with  A(0,0)  = {(0, 1)}  and  mov- 
ing through  stages  0, 1, . . . c — 1 by  repeatedly  carrying 
out  Steps  1 through  5,  we  process  the  entire  T*  network, 
ending  up  at  its  terminal  node  with  the  set  of  records 
A(c, mi).  These  records  are  really  the  same  as  the  de- 
sired truncated  probability  distribution  fl*,  except  that 
the  probability  lengths,  c(u),  have  to  be  normalized  by 
dividing  by  their  sum.  That  is, 


/*(<*)  = 


£uc(«) ' 


3.1.2  Pairwise  Convolution  of  the  Stratum  Spe- 
cific Truncated  Distributions 

We  restrict  our  discussion  to  the  convolution  of  fii  with 
^2-  The  resultant  distribution  may  be  convolved  with 
Q3  in  exactly  the  same  manner.  We  can  go  on  with  this 
pairwise  convolution  until  we  obtain  fi. 

First  sort  the  records  of  Di  in  ascending  order  of  <1 , and 
the  records  of  Q2  in  descending  order  of  <2-  Set  i = 1, 
j = 1.  Now  proceed  with  the  following  3-step  algorithm: 


Step  3:  Set  i = i + 1,  and  return  to  Step  1. 


There  are  many  ways  to  perform  the  convolution  at 
Step  2,  if  the  inequality  (3.14)  holds.  We  use  hashing 
to  club  records  having  the  same  value  of  t\  + <2-  The 
details  are  similar  to  Step  4.2  of  the  5-step  algorithm  for 
forward  processing  of  Tjt . A considerable  efficiency  gain 
is  achieved  because  we  need  not  consider  records  from 
Q2  located  at  positions  j or  below.  The  inequality  (3.14) 
ensures  that  they  can  never  contribute  to  the  final  set 
of  records  in  Q,  since  the  maximum  to  which  they  could 
be  augmented  is  less  than  oq.  This  is  analogous  to  the 
record  elimination  achieved  at  Step  4.1  of  the  5-step  al- 
gorithm for  forward  processing  of  ft . 


3.2  Evaluating  j3  and  its  Variance 

To  obtain  /?,  the  cmle  for  /?,  we  must  maximize  the  loga- 
rithm of  the  likelihood  (2.4).  Then  the  second  derivative 
of  the  log  likelihood,  evaluated  at  /?,  yields  the  desired 
variance.  But  direct  evaluation  of  the  log  likelihood  is 
not  an  easy  task,  given  the  complicated  expression  for 
the  denominator  of  (2.4).  In  fact  if  one  attempted  to 
evaluate  this  denominator  directly,  it  would  require  the 
enumeration  of  all  the  s 2 x c tables  in  0.  This  would 
make  the  asymptotic  inference  as  computationally  com- 
plex as  the  exact  inference.  Fortunately  there  is  an  easier 
approach  that  works  well  up  to  extremely  large  sample 
sizes.  Notice  that  the  denominator  of  (2.4)  is  the  same  as 
TP(0,0),  summed  over  all  the  strata.  We  can  easily  set 
up  recursions  like  (3.13)  for  TP,  its  first  derivative,  TP', 
and  its  second  derivative,  T P",  and  rapidly  evaluate  all 
three  quantities  during  the  backward  induction 

of  IV  For  example, 


Step  1:  Select  record  i from  flj.  Denote  it  by 

(t\ , fi(t\))-  Select  record  j from  Q?.  Denote  it  by 
(^2>  ^2(^2))' 

Step  2:  If 

J 

ij  + ^2  T Ik, max  ^ <J0  i 

k= 3 

set  j = j + 1,  and  return  to  Step  1.  But  if 

> 

1 1 + t{  + ^ tk,m ax  < 00  , (3-14) 

k—3 

convolve  record  i from  fii  with  each  of  the  first  j—  1 
records  from  ^2- 


TP'(j,mjk)=  X]  Pi+iATPU  + 1,mj+i,*)+ 

TP'(j  + l,mHlk)} 

It  is  easy  to  show  by  successive  differentiation  of  the  log- 
arithm of  (2.4)  that  the  second  derivative  of  the  contri- 
bution to  the  log  likelihood  of  the  fcth  stratum  is 

[rp(o,o)]-2[rp'(o,o)]2  - [rp(o,o)]-1[rp"(o,o)] 

(3.15) 

Evaluating  (3.15)  at  the  cmle  of  /?,  summing  across 
strata,  and  equating  the  resultant  second  derivative  to 
zero,  yields  the  desired  asymptotic  variance. 
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4 Concluding  Remarks 

The  following  technical  features  of  the  network  algorithm 
were  responsible  for  its  extraordinary  success: 

• The  network  representation  takes  advantage  of  the 
categorical  nature  of  the  data  by  requiring  only  as 
many  stages  as  there  are  discrete  categories. 

• The  number  of  nodes  in  the  T*  network  is  deter- 
mined min(m*,  m'k).  Thus  the  greater  the  imbalance 
between  the  two  row  sums,  the  smaller  the  network, 
and  the  easier  the  processing. 

• The  preliminary  backward  induction  pass  through 
the  network  provides  valuable  information  about 
the  ‘future’  for  each  stage  of  the  forward  process- 
ing. This  enables  us  to  generate  a truncated  per- 
mutation distribution  directly  at  the  forward  pass, 
rather  than  generating  the  full  permutation  distri- 
bution and  then  truncating  it  as  needed.  In  effect, 
substantially  fewer  records  are  carried  along  at  each 
stage  of  the  forward  pass,  as  records  not  satisfying 
the  LP  criterion  get  eliminated. 

• The  network  representation  enables  us  to  generate 
the  distribution  of  each  Tt  recursively  in  a stage- 
wise  forward  pass  through  the  network.  During  this 
forward  pass  paths  having  the  same  rank  length  up 
to  some  node  are  ‘clubbed’  together.  We  thus  deal 
only  with  paths  having  distinct  rank  lengths  up  to 
each  node,  rather  than  all  the  paths  up  to  that  par- 
ticular node. 

• The  backward  induction  step  enables  us  to  rapidly 
evaluate  the  denominator  of  (2.4),  and  its  first  and 
second  derivatives.  This  greatly  facilitates  the  con- 
ditional maximum  likelihood  inference. 
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